-
Vision graph neural networks (ViGs) have demonstrated promise in vision tasks as a competitive alternative to conventional convolutional neural networks (CNNs) and vision transformers (ViTs); however, common graph construction methods, such as k-nearest neighbor (KNN), can be expensive on larger images. While methods such as Sparse Vision Graph Attention (SVGA) have shown promise, SVGA's fixed step scale can lead to over-squashing and can require multiple connections to convey information that a single long-range link could provide. Motivated by this observation, we propose a new graph construction method, Logarithmic Scalable Graph Construction (LSGC), which enhances performance by limiting the number of long-range links. To this end, we propose LogViG, a novel hybrid CNN-GNN model that utilizes LSGC. Furthermore, inspired by the successes of multiscale and high-resolution architectures, we introduce a high-resolution branch and fuse features between our high-resolution and low-resolution branches for a multi-scale, high-resolution Vision GNN network. Extensive experiments show that LogViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification and semantic segmentation tasks. Our smallest model, Ti-LogViG, achieves an average top-1 accuracy on ImageNet-1K of 79.9% with a standard deviation of ± 0.2%, a 1.7% higher average accuracy than Vision GNN with a 24.3% reduction in parameters and a 35.3% reduction in GMACs. Our work shows that leveraging long-range links in graph construction for ViGs through our proposed LSGC can exceed the performance of current state-of-the-art ViGs.
Free, publicly-accessible full text available December 13, 2026.
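To make the contrast with KNN concrete, the sketch below builds edges over a 1-D sequence of patch tokens using exponentially growing hop distances (1, 2, 4, 8, ...), so a handful of links spans the same distance that a dense local construction covers with many. This is a minimal illustration of the logarithmic-scaling idea, not the paper's exact LSGC formulation; the function names and the `max_links` cap are assumptions.

```python
def lsgc_edges(num_tokens, max_links=None):
    """Connect each token to neighbors at exponentially growing hops
    (1, 2, 4, 8, ...), so a few long-range links replace many short ones.
    Illustrative sketch only; not the paper's exact construction."""
    edges = set()
    for i in range(num_tokens):
        hop, links = 1, 0
        while i + hop < num_tokens:
            if max_links is not None and links >= max_links:
                break  # cap on long-range links, per the LSGC motivation
            edges.add((i, i + hop))
            links += 1
            hop *= 2  # logarithmic scaling: each link doubles its reach
    return sorted(edges)

def knn_edges(num_tokens, k):
    """Baseline dense local construction: link each token to its k nearest
    successors; per-node cost grows linearly with the reach k."""
    return sorted({(i, i + d) for i in range(num_tokens)
                   for d in range(1, k + 1) if i + d < num_tokens})
```

With 16 tokens, token 0 reaches distance 8 using only 4 logarithmic links, where the KNN-style construction needs 8 links for the same reach.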
-
This paper delves into the frequency analysis of image datasets and neural networks, particularly Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs), and reveals an alignment property between datasets and network architecture design. Our analysis suggests that the frequency statistics of image datasets and the learning behavior of neural networks are intertwined. Based on this observation, our main contribution is a new framework for network optimization that guides the design process by adjusting a network's depth and width to align the frequency characteristics of untrained models with those of trained models. Our frequency analysis framework can be used to design neural networks with better performance-model-size trade-offs. Our results on ImageNet-1K image classification, CIFAR-100 image classification, and MS-COCO object detection and instance segmentation benchmarks show that our method is broadly applicable and can improve network architecture performance. Our investigation into the alignment between the frequency characteristics of image datasets and network architectures opens up a new direction in model analysis that can be used to design more efficient networks.
Free, publicly-accessible full text available June 9, 2026.
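A generic way to extract the kind of frequency statistics discussed above is a radial energy profile of the 2-D spectrum: how an image's spectral energy is distributed from low to high spatial frequencies. The paper's exact metric may differ; this is a stand-in sketch, and the binning scheme is an assumption.

```python
import numpy as np

def radial_frequency_profile(img, num_bins=8):
    """Normalized average spectral energy in radial frequency bands of a
    grayscale image (band 0 = lowest frequencies). A generic stand-in for
    dataset/network frequency statistics; not the paper's exact metric."""
    h, w = img.shape
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    cy, cx = h // 2, w // 2
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    max_r = radius.max()
    profile = np.zeros(num_bins)
    for b in range(num_bins):
        lo, hi = b * max_r / num_bins, (b + 1) * max_r / num_bins
        # Last band absorbs the outer boundary so every pixel is counted.
        mask = (radius >= lo) & (radius < hi) if b < num_bins - 1 else (radius >= lo)
        profile[b] = spectrum[mask].mean()
    return profile / profile.sum()
```

A smooth, slowly varying image concentrates its energy in the lowest band, while a noisy or highly textured one spreads energy into the higher bands; comparing such profiles across datasets (or across feature maps of trained vs. untrained models) is the flavor of analysis the framework builds on.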
-
Catastrophic forgetting is a significant challenge in online continual learning (OCL), especially for nonstationary data streams that do not have well-defined task boundaries. This challenge is exacerbated by the memory constraints and privacy concerns inherent in rehearsal buffers. To tackle catastrophic forgetting, in this paper we introduce Online-LoRA, a novel framework for task-free OCL. Online-LoRA fine-tunes pre-trained Vision Transformer (ViT) models in real time to address the limitations of rehearsal buffers and leverage the performance benefits of pre-trained models. As its main contribution, our approach features a novel online weight regularization strategy to identify and consolidate important model parameters. Moreover, Online-LoRA leverages the training dynamics of loss values to automatically recognize shifts in the data distribution. Extensive experiments across many task-free OCL scenarios and benchmark datasets (including CIFAR-100, ImageNet-R, ImageNet-S, CUB-200, and CORe50) demonstrate that Online-LoRA adapts robustly to various ViT architectures while achieving better performance than SOTA methods.
Free, publicly-accessible full text available February 26, 2026.
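The flavor of online weight regularization described above can be sketched as an importance-weighted anchor penalty: importance is a running average of squared gradients, and important parameters are pulled back toward consolidated values. This is a generic sketch under those assumptions, not Online-LoRA's actual strategy (which operates on LoRA parameters and couples consolidation to detected loss-dynamics shifts).

```python
import numpy as np

class OnlineWeightRegularizer:
    """Minimal sketch of online parameter consolidation (illustrative
    assumption, not Online-LoRA's exact algorithm): importance is an online
    running average of squared gradients, and important parameters are
    anchored to their consolidated values via an L2 penalty."""

    def __init__(self, params, strength=1.0, momentum=0.9):
        self.anchor = params.copy()          # consolidated parameter values
        self.importance = np.zeros_like(params)
        self.strength = strength
        self.momentum = momentum

    def update_importance(self, grads):
        # Running average of squared gradients approximates importance.
        self.importance = (self.momentum * self.importance
                           + (1 - self.momentum) * grads ** 2)

    def penalty_grad(self, params):
        # Gradient of 0.5 * strength * sum(importance * (p - anchor)^2);
        # add this to the task-loss gradient during each online update.
        return self.strength * self.importance * (params - self.anchor)

    def consolidate(self, params):
        # Call when a distribution shift is detected: freeze current values.
        self.anchor = params.copy()
```

The penalty gradient is added to the ordinary task gradient each step, so parameters deemed important drift slowly while unimportant ones stay free to adapt to the new distribution.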
-
Vision transformers (ViTs) have dominated computer vision in recent years. However, ViTs are computationally expensive and not well suited for mobile devices; this has led to the prevalence of convolutional neural network (CNN) and ViT-based hybrid models for mobile vision applications. Recently, Vision GNN (ViG) and CNN hybrid models have also been proposed for mobile vision tasks. However, all of these methods remain slower than pure CNN-based models. In this work, we propose Multi-Level Dilated Convolutions to devise a purely CNN-based mobile backbone. Using Multi-Level Dilated Convolutions allows for a larger theoretical receptive field than standard convolutions. Different levels of dilation also allow for interactions between the short-range and long-range features in an image. Experiments show that our proposed model outperforms state-of-the-art (SOTA) mobile CNN, ViT, ViG, and hybrid architectures in terms of accuracy and/or speed on image classification, object detection, instance segmentation, and semantic segmentation. Our fastest model, RapidNet-Ti, achieves 76.3% top-1 accuracy on ImageNet-1K with 0.9 ms inference latency on an iPhone 13 mini NPU, which is faster and more accurate than MobileNetV2x1.4 (74.7% top-1 with 1.0 ms latency). Our work shows that pure CNN architectures can beat SOTA hybrid and ViT models in terms of accuracy and speed when designed properly.
Free, publicly-accessible full text available February 26, 2026.
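The receptive-field argument can be made concrete in one dimension: a dilated convolution spaces its kernel taps `dilation` samples apart, so a k-tap kernel spans (k-1)*dilation + 1 input samples without adding weights, and summing levels with different dilations mixes short- and long-range features. The sketch below is illustrative; the dilation rates and fusion-by-summation are assumptions, not the paper's exact block design.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D convolution with holes: kernel taps are `dilation`
    samples apart, enlarging the receptive field without extra weights.
    Effective receptive field = (len(kernel) - 1) * dilation + 1."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out_len = len(x) - span + 1
    return np.array([sum(kernel[j] * x[i + j * dilation] for j in range(k))
                     for i in range(out_len)])

def multi_level(x, kernel, dilations=(1, 2, 4)):
    """Illustrative multi-level fusion (dilation rates are assumptions):
    crop every level's output to the shortest length and sum, so
    short-range and long-range features interact."""
    outs = [dilated_conv1d(x, kernel, d) for d in dilations]
    n = min(len(o) for o in outs)
    return sum(o[:n] for o in outs)
```

With a 3-tap kernel, dilations of 1, 2, and 4 give receptive fields of 3, 5, and 9 samples respectively; a 2-D analogue is what enlarges the theoretical receptive field in a CNN backbone.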
-
For rapidly spreading diseases where many cases show no symptoms, swift and effective contact tracing is essential. While exposure notification applications provide alerts on potential exposures, a fully automated system is needed to track the infectious transmission routes. To this end, our research leverages large-scale contact networks from real human mobility data to identify the path of transmission. More precisely, we introduce a new Infectious Path Centrality network metric that informs a graph learning edge classifier to identify important transmission events, achieving an F1-score of 94%. Additionally, we explore bidirectional contact tracing, which quarantines individuals both retroactively and proactively, and compare its effectiveness against traditional forward tracing, which only isolates individuals after they test positive. Our results indicate that when only 30% of symptomatic individuals are tested, bidirectional tracing can reduce the effective reproduction rate by 71%, thus significantly controlling the outbreak.
Free, publicly-accessible full text available January 1, 2026.
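The forward-versus-bidirectional distinction can be shown on a toy directed contact graph: forward tracing only quarantines downstream contacts of a confirmed case, while bidirectional tracing first walks backwards toward plausible sources and then forwards from them, catching sibling transmission branches. This sketch omits the paper's graph-learning classifier and centrality metric entirely; the edge representation is an assumption.

```python
from collections import defaultdict, deque

def trace(contacts, case, bidirectional=False):
    """Toy contact-tracing sketch (not the paper's learned classifier).
    `contacts` are directed (infector -> infectee) edges. Forward tracing
    reaches only descendants of `case`; bidirectional tracing also walks
    back to possible sources, then forward from them."""
    fwd, back = defaultdict(set), defaultdict(set)
    for a, b in contacts:
        fwd[a].add(b)
        back[b].add(a)
    roots = {case}
    if bidirectional:
        queue = deque([case])
        while queue:                       # backward pass to find sources
            node = queue.popleft()
            for parent in back[node]:
                if parent not in roots:
                    roots.add(parent)
                    queue.append(parent)
    reached, queue = set(roots), deque(roots)
    while queue:                           # forward pass from all roots
        node = queue.popleft()
        for child in fwd[node]:
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return reached - {case}                # individuals to quarantine
```

On the chain s → a → c with a second branch s → b, forward tracing from case `a` quarantines only `c`, while bidirectional tracing also recovers the source `s` and its other branch `b`.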
-
Deploying monocular depth estimation on resource-constrained edge devices is a significant challenge, particularly when attempting to perform both training and inference concurrently. Current lightweight, self-supervised approaches typically rely on complex frameworks that are hard to implement and deploy in real-world settings. To address this gap, we introduce the first framework for Lightweight Training and Inference (LITI) that combines ready-to-deploy models with streamlined code and fully functional, parallel training and inference pipelines. Our experiments show various models being deployed for inference, training, or both, leveraging inputs from a real-time RGB camera sensor. Thus, our framework enables training and inference on resource-constrained edge devices for complex applications such as depth estimation.
Free, publicly-accessible full text available May 6, 2026.
-
Federated continual learning is a decentralized approach that enables edge devices to continuously learn new data, mitigating catastrophic forgetting while collaboratively training a global model. However, existing state-of-the-art approaches in federated continual learning focus primarily on learning to classify discrete sets of images, leaving dense regression tasks such as depth estimation unaddressed. Furthermore, autonomous agents that use depth estimation to explore dynamic indoor environments inevitably encounter spatial and temporal shifts in data distributions. These shifts trigger a phenomenon called spatio-temporal catastrophic forgetting, a more complex and challenging form of catastrophic forgetting. In this paper, we address the fundamental research question: "Can we mitigate spatio-temporal catastrophic forgetting in federated continual learning for depth estimation in dynamic indoor environments?" To address this question, we propose Local Online and Continual Adaptation (LOCA), the first approach to address spatio-temporal catastrophic forgetting in dynamic indoor environments. LOCA relies on two key algorithmic innovations: online batch skipping and continual local aggregation. Our extensive experiments show that LOCA mitigates spatio-temporal catastrophic forgetting and improves global model performance, while running on-device up to 3.35× faster and consuming 3.13× less energy than the state-of-the-art. Thus, LOCA lays the groundwork for scalable autonomous systems that adapt in real time to private and dynamic indoor environments.
Free, publicly-accessible full text available June 9, 2026.
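One simple way to realize the online batch-skipping idea is to maintain a running mean of recent losses and skip batches the model already handles well, saving on-device compute for batches that reflect a distribution shift. The rule below is an illustrative assumption (the margin, momentum, and exact LOCA criterion are not taken from the paper).

```python
def should_skip(loss, state, margin=0.1, momentum=0.9):
    """Illustrative online batch-skipping rule (not LOCA's exact
    criterion): skip a training batch when its loss falls clearly below
    the running mean of recent losses, i.e. the model already fits it."""
    if state["mean"] is None:
        state["mean"] = loss               # first batch: always train
        return False
    skip = loss < state["mean"] - margin
    # Update the running mean regardless, so the baseline tracks drift.
    state["mean"] = momentum * state["mean"] + (1 - momentum) * loss
    return skip
```

Under this rule, an easy batch (low loss) is skipped, but a sudden loss spike, such as one caused by entering a new room, is always trained on, which is the behavior an online adaptation scheme needs.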
-
Batteryless sensing devices that rely on energy harvesting can enable more sustainable and long-lasting Internet of Things (IoT) based wearables. While it has become feasible to implement energy-harvesting based wearables for digital health applications, it remains challenging to integrate such devices and the data they collect into machine learning pipelines for tasks such as human activity recognition (HAR). A key obstacle is uncertainty in the data acquisition process. Given the discontinuous and uncertain availability of harvested energy, when should a sensor spend energy to sample and transmit data packets for processing? A common approach is to spend energy opportunistically by sending packets whenever sufficient energy is available. However, when considering a specific task, namely HAR with kinetic energy harvesting based sensors, this approach unfairly prioritizes data from activities where more energy can be harvested (e.g., running). In this work, we improve the opportunistic energy spending policy by pruning redundant packets to reallocate energy towards activities where less energy is harvested. Our approach results in an increase in the F1-score of 'lower energy' activities while having a minimal impact on the F1-score of 'higher energy' activities.
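The pruning idea can be sketched as a simple redundancy filter: transmit a sensor window only if it differs enough from the last transmitted one, so repetitive high-energy activity (e.g., steady running) stops monopolizing the energy budget. The distance metric and threshold below are illustrative assumptions, not the paper's policy.

```python
def prune_redundant(windows, threshold):
    """Toy sketch of redundant-packet pruning (metric and threshold are
    assumptions): transmit a sensor window only if it differs from the
    last transmitted window by more than `threshold` in any component,
    freeing harvested energy for rarer, lower-energy activities."""
    sent, last = [], None
    for w in windows:
        if last is None or max(abs(a - b) for a, b in zip(w, last)) > threshold:
            sent.append(w)                 # spend energy: window is novel
            last = w                       # it becomes the new reference
        # else: drop the packet and bank the energy for later
    return sent
```

A stream of four windows where consecutive pairs are near-duplicates is reduced to two transmissions, halving the energy spent on that activity.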
-
The effectiveness of digital contact tracing during extended outbreaks of airborne infectious diseases, such as COVID-19, influenza, or RSV, can be hindered by limited social compliance and delays in real-world testing. Prior work has shown the utility of graph learning for bidirectional contact tracing and of multi-agent reinforcement learning (MARL) for disease mitigation; however, these approaches rely on post-hoc analysis and full testing compliance, limiting their real-time applicability. To address these limitations, we propose a new framework for online automated bidirectional contact tracing and disease-aware navigation. Our framework iteratively identifies infectious culprits, infers individual health statuses, and deploys agents to minimize infectious exposure without requiring oracle health information. Our proposed framework achieves an average online backwards tracing F1-score of 92% and estimates total case counts to within 5% accuracy, even under conditions of probabilistic testing with significant social hesitancy. Additionally, our proposed agent-based navigation system can reduce disease spread by 29%. These results demonstrate the framework's potential to address critical gaps in traditional disease surveillance and mitigation models and to improve real-time public health interventions.
Free, publicly-accessible full text available May 5, 2026.
-
In deep learning (DL) based human activity recognition (HAR), sensor selection seeks to balance prediction accuracy and sensor utilization (how often a sensor is used). With advances in on-device inference, sensors have become tightly integrated with DL, often restricting access to the underlying model. Given only sensor predictions, how can we derive a selection policy that classifies efficiently while maximizing accuracy? We propose a cascaded inference approach which, given the prediction of any one sensor, determines whether to query all other sensors. Typically, cascades use a sequence of classifiers that terminates once the confidence of a classifier exceeds a threshold. However, a threshold-based policy for sensor selection may be suboptimal; we define a more general class of policies that can outperform threshold-based policies. We extend our approach to settings where little or no labeled data is available for tuning the policy. Our analysis is validated on three HAR datasets by improving upon the F1-score of a threshold policy across several utilization budgets. Overall, our work enables practical analytics for HAR by relaxing the requirement of labeled data for sensor selection and reducing sensor utilization to directly extend a sensor system's lifetime.
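The contrast between a plain threshold cascade and a more general policy can be sketched directly on the first sensor's class-probability vector. The class-dependent variant below is one illustrative member of a broader policy class (its specific form is an assumption, not the paper's construction): the querying threshold may depend on which activity the first sensor predicts, since some activities are harder to confirm from one sensor than others.

```python
def threshold_policy(probs, tau):
    """Standard cascade rule: query the remaining sensors only when the
    first sensor's confidence (max class probability) is below tau."""
    return max(probs) < tau

def class_dependent_policy(probs, tau_by_class):
    """Illustrative generalization (an assumption, not the paper's exact
    policy): the querying threshold depends on the predicted class, so
    hard-to-confirm activities trigger extra sensor queries more readily."""
    pred = max(range(len(probs)), key=lambda c: probs[c])
    return probs[pred] < tau_by_class[pred]
```

With per-class thresholds (0.5, 0.9), a 0.6-confident prediction of class 0 terminates the cascade, while the same confidence for class 1 still queries the other sensors, behavior a single global threshold cannot express.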
